Efficient Seeding Techniques for Protein Similarity Search

Authors

  • Mikhail A. Roytberg
  • Anna Gambin
  • Laurent Noé
  • Slawomir Lasota
  • Eugenia Furletova
  • Ewa Szczurek
  • Gregory Kucherov
Abstract

We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is how to design efficient seed alphabets from which seeds with optimal sensitivity/selectivity trade-offs can be constructed. We propose several design methods and use them to construct several alphabets. We then analyze seeds built over those alphabets and compare them with the standard Blastp seeding method [2,3], as well as with the family of vector seeds proposed in [4]. Although the formalism of subset seeds is less expressive (but less costly to implement) than the accumulative principle used in Blastp and vector seeds, our seeds show similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix.
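To make the idea of a seed alphabet more concrete, the following is a minimal sketch, not the construction used by the authors. It assumes that each seed letter is a partition of the 20 amino acids and that a seed position accepts a pair of residues when both fall into the same group. The three seed letters (`#`, `@`, `_`), the residue grouping behind `@`, and the function names are all hypothetical choices made for illustration; they are not the alphabets designed in the paper.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def partition(groups):
    """Map each residue to the index of the group that contains it."""
    table = {}
    for idx, group in enumerate(groups):
        for aa in group:
            table[aa] = idx
    assert set(table) == set(AMINO_ACIDS), "groups must cover all residues"
    return table


# Hypothetical seed alphabet (an assumption, not taken from the paper):
#   '#' accepts only identical residues,
#   '@' accepts residues from the same illustrative group,
#   '_' accepts any pair of residues.
SEED_LETTERS = {
    "#": partition(list(AMINO_ACIDS)),
    "@": partition(["ILMV", "FWY", "KRH", "DENQ", "ST", "AG", "C", "P"]),
    "_": partition([AMINO_ACIDS]),
}


def seed_key(seed, window):
    """Hash key of a window: the group index of each residue, per seed letter."""
    return tuple(SEED_LETTERS[s][aa] for s, aa in zip(seed, window))


def find_hits(seed, query, subject):
    """Return all (query_pos, subject_pos) pairs where the seed matches."""
    span = len(seed)
    # Index every subject window by its key.
    index = defaultdict(list)
    for j in range(len(subject) - span + 1):
        index[seed_key(seed, subject[j:j + span])].append(j)
    # Two windows form a hit exactly when their keys are equal.
    hits = []
    for i in range(len(query) - span + 1):
        for j in index.get(seed_key(seed, query[i:i + span]), []):
            hits.append((i, j))
    return hits


relaxed = find_hits("#@_#", "KLDA", "KIEA")
exact = find_hits("####", "KLDA", "KIEA")
```

Under these assumptions, a hit is decided position by position and reduces to equality of hash keys, which is one way to read the abstract's remark that subset seeds are less costly to implement than a score-threshold criterion. In this toy input the relaxed seed `#@_#` reports a hit that the exact seed `####` misses.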


Similar resources

Using Sequence Ensembles for Seeding Alignments of MinION Sequencing Data

In bioinformatics, sequence similarity search is often solved by seed-and-extend heuristics: we first locate short exact matches (hits) by hashing or other efficient indexing techniques and then extend these hits to longer sequence alignments. Such approaches are effective at finding very similar sequences, but they quickly lose sensitivity when trying to locate weaker similarities. In this pap...


Subset Seed Extension to Protein BLAST

Abstract: The seeding technique became central in the theory of sequence alignment and there are several efficient tools applying seeds to DNA homology search. Recently, a concept of subset seeds has been proposed for similarity search in protein sequences. We experimentally evaluate the applicability of subset seeds to protein homology search. We advocate the use of multiple subset seeds de...


Computer Aided Molecular Modeling Of Membrane Metalloprotease

Molecular modeling is a set of computational techniques for constructing the 3D structure of a protein, especially membrane-bound proteins whose structures cannot be elucidated using experimental techniques. These techniques have been applied in the study of membrane metalloproteases for comparing wild and mutated enzymes, docking inhibitors in the catalytic site and examination of binding pocket...


Languages of lossless seeds

Several algorithms for similarity search employ seeding techniques to quickly discard very dissimilar regions. In this paper, we study theoretical properties of lossless seeds, i.e., spaced seeds having full sensitivity. We prove that lossless seeds coincide with languages of certain sofic subshifts, hence they can be recognized by finite automata. Moreover, we show that these subshifts are ful...


PGR: A Graph Repository of Protein 3D-Structures

Graph theory and graph mining constitute rich fields of computational techniques to study the structures, topologies and properties of graphs. These techniques constitute a good asset in bioinformatics if there exist efficient methods for transforming biological data into graphs. In this paper, we present Protein Graph Repository (PGR), a novel database of protein 3D-structures transformed into...



Publication date: 2008